Data

Example instances. To do: replace with a 10x10 tile plot showing 10 instances of one digit in each row.
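
A 10x10 tile plot of this kind could be sketched in base R as follows. The objects `labels` and `pixels` are hypothetical stand-ins filled with random values, and the sketch assumes each image is stored as 784 pixels in row-major order (28 x 28).

```r
# Synthetic stand-in (hypothetical): 1000 images of 28 x 28 pixels, labels 0-9.
set.seed(2)
labels <- rep(0:9, times = 100)
pixels <- matrix(sample(0:255, 1000 * 784, replace = TRUE), nrow = 1000)

# For each digit, take the row indices of its first 10 instances.
idx <- unlist(lapply(0:9, function(d) head(which(labels == d), 10)))

# 10 x 10 grid of panels, filled row by row: one digit per row.
op <- par(mfrow = c(10, 10), mar = c(0, 0, 0, 0))
for (i in idx) {
  img <- matrix(pixels[i, ], nrow = 28, ncol = 28)
  # Reverse the columns so a row-major image is not drawn upside down.
  # The palette maps 0 to white and 255 to black.
  image(img[, 28:1], col = gray(seq(1, 0, length.out = 256)), axes = FALSE)
}
par(op)
```

With real data, `pixels` would hold the pixel columns of the training set and `labels` the digit column.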

Data Exploration

The classes are fairly evenly distributed. Random sampling could also be tried instead of head(10000) at the top.
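
The random-sampling alternative could look like the sketch below. The data frame `digits` and its columns are hypothetical names, filled here with synthetic values so the example runs on its own.

```r
# Synthetic stand-in for the training data (hypothetical names and values).
set.seed(42)
digits <- data.frame(
  label  = sample(0:9, 50000, replace = TRUE),
  pixel1 = sample(0:255, 50000, replace = TRUE)
)

# head() keeps the first rows, which is only representative if the
# file order is already random.
first_rows <- head(digits, 10000)

# Sampling row indices without replacement draws from the whole data set.
sampled_rows <- digits[sample(nrow(digits), 10000), ]

# Compare the class distribution of both subsets.
round(prop.table(table(first_rows$label)), 3)
round(prop.table(table(sampled_rows$label)), 3)
```

Fixing the seed keeps the sampled subset reproducible between runs.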

The majority of pixel values are either 0 (white) or 255 (black), so most values carry little information. This motivates dimensionality reduction.
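
One way to quantify this is to compute the share of values that are exactly 0 or 255. The sketch below uses a synthetic `pixels` matrix (a hypothetical stand-in); the sampling probabilities are made up for illustration and are not taken from the real data.

```r
# Synthetic 8-bit image matrix: one row per image, one column per pixel.
# The probabilities below are invented so that most values are 0 or 255.
set.seed(1)
pixels <- matrix(
  sample(c(0, 255, 1:254), 100 * 784, replace = TRUE,
         prob = c(0.80, 0.10, rep(0.10 / 254, 254))),
  nrow = 100, ncol = 784
)

# Share of values that are exactly 0 or exactly 255.
extreme_share <- mean(pixels == 0 | pixels == 255)
extreme_share

# Coarse counts of intensities, to see the mass at both ends.
table(cut(pixels, breaks = c(-1, 0, 254, 255),
          labels = c("0", "1-254", "255")))
```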

## `summarise()` regrouping output by 'x', 'y' (override with `.groups` argument)

Average representation of each digit. This gives a good idea of the variability within each class.
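
A mean image per digit can be computed with base R as in this sketch. The objects `labels` and `pixels` are hypothetical stand-ins with random values, and the 28 x 28 shape is an assumption based on the 784 predictors.

```r
# Synthetic stand-in: 200 images of 784 pixels with labels 0-9.
set.seed(7)
labels <- rep(0:9, each = 20)
pixels <- matrix(sample(0:255, 200 * 784, replace = TRUE), nrow = 200)

# rowsum() adds up the pixel rows per label; dividing by the class counts
# gives one mean image (a vector of 784 values) per digit.
class_sums  <- rowsum(pixels, group = labels)
class_means <- class_sums / as.vector(table(labels))

# Per-pixel standard deviation within one class (here digit 3) shows
# where instances of that digit differ most.
sd_three <- apply(pixels[labels == 3, ], 2, sd)

# Reshape one mean image to 28 x 28, for example for plotting with image().
mean_three <- matrix(class_means["3", ], nrow = 28, ncol = 28)
dim(mean_three)
```

The mean image shows the typical shape of a digit, while the per-pixel standard deviation makes the variability explicit.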

## Loading required package: lattice
## Warning: package 'lattice' was built under R version 3.6.2
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift

Data Processing

Dimensionality Reduction

Near Zero Variance
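
A minimal sketch of near-zero-variance filtering with `caret::nearZeroVar()`, using its default cutoffs. The `pixels` matrix is a small synthetic stand-in (hypothetical), with a few constant columns added on purpose to mimic border pixels.

```r
library(caret)

# Synthetic stand-in (hypothetical): 500 images, 20 pixel columns.
set.seed(3)
pixels <- matrix(sample(0:255, 500 * 20, replace = TRUE), nrow = 500)
colnames(pixels) <- paste0("pixel", 1:20)
pixels[, 1:5] <- 0    # pixels that never change
pixels[1, 5]  <- 255  # one column that is almost, but not fully, constant

# nearZeroVar() returns the indices of columns with zero or near-zero variance.
nzv_cols <- nearZeroVar(pixels)

# Drop those columns; guard against the case where none are flagged.
reduced <- if (length(nzv_cols) > 0) pixels[, -nzv_cols, drop = FALSE] else pixels
dim(reduced)
```

The same column indices would have to be removed from any test set before prediction.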

PCA
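
A minimal PCA sketch with `stats::prcomp()`. The `pixels` matrix is synthetic (hypothetical), and the 95% variance threshold is an arbitrary choice for illustration, not a value from this analysis.

```r
# Synthetic stand-in (hypothetical): 300 images, 50 pixel columns.
set.seed(11)
pixels <- matrix(rnorm(300 * 50), nrow = 300)

# prcomp() centers the columns by default; scaling is left off here.
pca <- prcomp(pixels, center = TRUE, scale. = FALSE)

# Cumulative share of variance explained by the leading components.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components that reaches the 95% threshold.
n_comp <- which(var_explained >= 0.95)[1]

# Scores of the images on the retained components.
scores <- pca$x[, 1:n_comp, drop = FALSE]
dim(scores)
```

New data can be projected onto the same components with `predict(pca, newdata)`.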

Modeling

## Warning: The `i` argument of ``[`()` can't be a matrix as of tibble 3.0.0.
## Convert to a vector.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: Setting row names on a tibble is deprecated.

## k-Nearest Neighbors 
## 
## 4001 samples
##  784 predictor
##   10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' 
## 
## No pre-processing
## Resampling: Cross-Validated (3 fold) 
## Summary of sample sizes: 2667, 2668, 2667 
## Resampling results across tuning parameters:
## 
##   k  Accuracy   Kappa    
##   1  0.9305179  0.9227232
##   2  0.9162703  0.9068747
##   3  0.9240183  0.9154920
##   4  0.9212706  0.9124318
##   5  0.9200192  0.9110354
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 1.
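
A model summary of this shape can be produced with `caret::train()`. The sketch below mirrors the resampling scheme (3-fold cross-validation) and tuning grid (k = 1 to 5) reported above, but runs on small synthetic data; `x`, `y`, and the feature names are hypothetical, and the original call may have differed.

```r
library(caret)

# Synthetic stand-in (hypothetical): 300 samples, 10 features, 3 classes.
set.seed(5)
y <- factor(rep(c("a", "b", "c"), each = 100))
x <- matrix(rnorm(300 * 10), nrow = 300) + as.integer(y)  # shift each class
colnames(x) <- paste0("f", 1:10)

# 3-fold cross-validation over k = 1..5.
ctrl <- trainControl(method = "cv", number = 3)
fit  <- train(x = x, y = y, method = "knn",
              trControl = ctrl, tuneGrid = data.frame(k = 1:5))

fit$bestTune   # selected k
fit$results    # accuracy and kappa for each k
```

Passing a plain matrix or data frame rather than a tibble is one way to avoid row-name warnings like the ones shown earlier.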

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 103   0   0   0   0   0   0   0   1   2
##          1   0  99   2   1   2   1   1   1   2   1
##          2   0   3  92   1   0   0   1   0   0   0
##          3   0   0   1  85   0   2   0   0   3   0
##          4   0   0   0   0 108   0   0   1   0   5
##          5   0   0   0   3   0  80   0   0   3   1
##          6   2   0   0   1   1   2 102   0   1   1
##          7   0   0   4   2   0   0   0 102   0   3
##          8   0   0   0   0   0   0   0   0  76   0
##          9   0   0   0   0   1   0   0   4   0  92
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9399          
##                  95% CI : (0.9234, 0.9539)
##     No Information Rate : 0.1121          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9332          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.9810   0.9706  0.92929  0.91398   0.9643  0.94118
## Specificity            0.9966   0.9877  0.99444  0.99338   0.9932  0.99234
## Pos Pred Value         0.9717   0.9000  0.94845  0.93407   0.9474  0.91954
## Neg Pred Value         0.9978   0.9966  0.99224  0.99119   0.9955  0.99452
## Prevalence             0.1051   0.1021  0.09910  0.09309   0.1121  0.08509
## Detection Rate         0.1031   0.0991  0.09209  0.08509   0.1081  0.08008
## Detection Prevalence   0.1061   0.1101  0.09710  0.09109   0.1141  0.08709
## Balanced Accuracy      0.9888   0.9792  0.96187  0.95368   0.9788  0.96676
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.9808   0.9444  0.88372  0.87619
## Specificity            0.9911   0.9899  1.00000  0.99441
## Pos Pred Value         0.9273   0.9189  1.00000  0.94845
## Neg Pred Value         0.9978   0.9932  0.98917  0.98559
## Prevalence             0.1041   0.1081  0.08609  0.10511
## Detection Rate         0.1021   0.1021  0.07608  0.09209
## Detection Prevalence   0.1101   0.1111  0.07608  0.09710
## Balanced Accuracy      0.9859   0.9672  0.94186  0.93530
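
The headline statistics can be recomputed by hand from the confusion matrix, which makes their definitions explicit. The sketch below copies the counts from the table above; the variable names are illustrative.

```r
# Confusion matrix copied from the output above
# (rows = predicted digit, columns = reference digit).
cm <- matrix(c(
  103,  0,  0,  0,   0,  0,   0,   0,  1,  2,
    0, 99,  2,  1,   2,  1,   1,   1,  2,  1,
    0,  3, 92,  1,   0,  0,   1,   0,  0,  0,
    0,  0,  1, 85,   0,  2,   0,   0,  3,  0,
    0,  0,  0,  0, 108,  0,   0,   1,  0,  5,
    0,  0,  0,  3,   0, 80,   0,   0,  3,  1,
    2,  0,  0,  1,   1,  2, 102,   0,  1,  1,
    0,  0,  4,  2,   0,  0,   0, 102,  0,  3,
    0,  0,  0,  0,   0,  0,   0,   0, 76,  0,
    0,  0,  0,  0,   1,  0,   0,   4,  0, 92
), nrow = 10, byrow = TRUE,
dimnames = list(Prediction = as.character(0:9),
                Reference  = as.character(0:9)))

accuracy    <- sum(diag(cm)) / sum(cm)   # correct predictions / all predictions
sensitivity <- diag(cm) / colSums(cm)    # per reference class (recall)
pos_pred    <- diag(cm) / rowSums(cm)    # per predicted class (precision)

round(accuracy, 4)
round(sensitivity, 4)
round(pos_pred, 4)
```

Digits 8 and 9 have the lowest sensitivity; the off-diagonal counts show, for example, that five true 9s were predicted as 4.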